Back to main menu

library(DBI)
library(sf)
library(dplyr)
library(gimms)
library(raster)
library(ncdf4)
library(rgdal)
library(spdep)

library(stringr)
library(ggplot2)
library(cowplot)
library(spdep)
library(tibble)
library(RColorBrewer)
library(colorspace)

library(kableExtra)

library(gridExtra)
library(grid)
library(png)

source("f_plotspatial.R")
memory.limit(size = 160000)
## [1] 160000
compute.effect.size <- function(df.coef){

df.es <- df.coef[,1:3]
n <- length(df.es$Parameter)
m <- df.es$Parameter


for(i in 1:n){
  if(grepl("log",m[i])==T){
    df.es[i,2] <- 1.01^(df.coef[i,2])
    df.es[i,3] <- 1.01^(df.coef[i,2])*log(1.01)*df.coef[i,3]
  }else if(m[i] %in% c("NDVI","Bog","Arable","Forest","Temp")){
    df.es[i,2] <- exp(df.coef[i,2]*0.01)
    df.es[i,3] <- 0.01*df.coef[i,2]*df.coef[i,3]
  }else if(m[i] %in% c("TNdep")){
    df.es[i,2] <- exp(df.coef[i,2]*25)
    df.es[i,3] <- 25*exp(df.coef[i,2]*25)*df.coef[i,3]
  }else if(m[i] %in% c("TSdep")){
    df.es[i,2] <- exp(df.coef[i,2]*11)
    df.es[i,3] <- 11*exp(df.coef[i,2]*11)*df.coef[i,3]
  }else {
    df.es[i,2] <- exp(df.coef) # effect size for temperature???
  }
}

df.es$Percent <- 100*(df.es$Estimate-1)

return(df.es)

}

# other es
eslog <- 1.01
es <- 0.01
library(captioner)
fig_nums <- captioner() 
table_nums <- captioner(prefix = "Table")

1 Catchment data for Norway, Sweden and Finland

1.1 Catchment polygons

The catchment and TOC data are stored on the PostgreSQL database ecco_biwa, accessed through Rstudio using the st_read function in the sf package (Pebesma 2021) and the dplyr package (Wickham et al. 2020). Each catchment polygon is linked to the corresponding TOC concentration via an “ebint” (identifyer), internal to the ecco_biwa database.

The catchment polygons were designed by an elevation model and are assigned to the studied lake based on the distance to the sampling coordinates.

con <- dbConnect(RPostgreSQL::PostgreSQL(),user = "camille.crapart", password = "camille",host = "vm-srv-wallace.vm.ntnu.no", dbname = "nofa")
#catchment.geom <- tbl(con,sql("SELECT ebint, geom FROM catchments.lake_catchments"))

ebint.tbl <- tbl(con,sql("SELECT ebint FROM catchments.lake_catchments")) 
ebint.catch <- pull(ebint.tbl,ebint)

ebint.waterchem <- tbl(con,sql("SELECT ebint,nation FROM environmental.north_euro_lake_surv_1995")) %>% tbl_df()

ebint <- intersect(ebint.waterchem$ebint, ebint.catch)

country <- filter(ebint.waterchem, ebint %in% ebint)
saveRDS(country,"country.Rdata")
catchment.poly.1000 <- st_read(con,query = "SELECT ebint, geom FROM catchments.lake_catchments WHERE ebint IN (SELECT ebint FROM environmental.north_euro_lake_surv_1995)")
saveRDS(catchment.poly.1000,"catchment.poly.1000.Rdata")

The catchment polygons were reprojected in WGS84 to match the satellite data that will be extracted later, and in EPSG:3035 (European projection) to match the CORINE land cover data.

catchment.poly <- st_transform(catchment.poly.1000, CRS("EPSG:4326"))
saveRDS(catchment.poly,"catchment.poly.Rdata")
catchment.poly.corine <- st_transform(catchment.poly.1000, CRS("EPSG:3035"))
saveRDS(catchment.poly.corine, "Fennoscandia_NTNU/catchment.poly.Rdata")
st_write(catchment.poly,"catchment.poly.shp")

1.2 TOC

The TOC data was measured after the Northern Lake Survey 1995 (Henriksen et al. 1998). It was stored in the ecco_biwa database (https://github.com/andersfi/BiWa_ECCO).

toc.tbl <- tbl(con,sql("SELECT ebint,toc_mg_l,longitude,latitude,dist_closest_ebint,dist_2nd_closest_ebint FROM environmental.north_euro_lake_surv_1995"))
toc.df <- as.data.frame(toc.tbl)
saveRDS(toc.df,"toc.df.Rdata")

1.3 NDVI

NDVI values are extracted from the GIMMS NDVI3g dataset (The National Center for Atmospheric Research 2018), stored on http://poles.tpdc.ac.cn/en/data/9775f2b4-7370-4e5e-a537-3482c9a83d88/, in the “ecocast” dataset and accessed via the “gimms” package (Detsch 2021) on Rstudio.

Data is taken bi-monthly (Tucker et al. 2005). Raster layers of all slices are downloaded using the downloadGimms function, then monthly composites are calculated using the monthlyComposites function. The maximum NDVI value is taken for each pixel. Afterwards the NDVI values for each catchment polygon are extracted using the extract function from the raster package (Hijmans et al. 2018).

The values stored in the ecocast dataset are composed of the NDVI value and a flag value indicating the goodness of the data. All the NDVI values extracted, corresponding to the values for the studied catchments, had a flag of 1, indicating a good value. The NDVI value was retrieved from the stored value using the formula \(floor(ndvi3g/10)/1000\) (Detsch 2021), and then the mean of the 3 summer values (June, July and August) was computed for each catchment.


# Download Gimms NDVI

ndvi.1994 <- downloadGimms(x= 1994,y=1994,dsn = "NDVI")
ndvi.max <- monthlyComposite(ndvi.1994,monthlyIndices(ndvi.1994))
saveRDS(ndvi.max,"ndvi.max.rds")

# Load data

ndvi.max <- readRDS("ndvi.max.rds")
catchment.poly <- readRDS("catchment.poly.Rdata")

# Computes summer mean
summer.mean <- raster::stack(ndvi.max[[6]],ndvi.max[[7]],ndvi.max[[8]]) %>% mean()

# Crops raster to Scandinavia
summer.scandinavia <- raster::crop(summer.mean,c(0,35,55,73))

# Re-introduces NA
summer.scan <- reclassify(summer.scandinavia, cbind(-Inf, 0, NA), right=FALSE)
saveRDS(summer.scan, "summer.scan.rds")

# Extract NDVI for each catchment
summer.ndvi <- raster::extract(summer.scan,catchment.poly, fun = mean, df = T, sp = T)
names(summer.ndvi) <- c("ebint","ndvi")
summer.ndvi$ndvi.value <- floor(summer.ndvi$ndvi/10)/1000
summer.ndvi$flag.value <- summer.ndvi$ndvi - floor(summer.ndvi$ndvi/10)*10 + 1 

# Saves NDVI file
saveRDS(summer.ndvi, "summer.ndvi.Rdata")
write.csv(summer.ndvi,"summer.ndvi.csv")
summer.scan <- readRDS("summer.scan.rds")
wr.sf.95 <- readRDS("WR/wr.sf.95.rds")

ndvi.plot <- (floor(summer.scan)/10)/1000

ndvi.plot.spdf <- ndvi.plot %>% as("SpatialPixelsDataFrame") %>% spTransform(projection(wr.sf.95)) %>% as.data.frame()

map.ndvi <- ggplot() + geom_tile(data = ndvi.plot.spdf, aes(x = x, y = y, fill = layer, width = 10000, height = 10000))+
  scale_fill_distiller(type = "seq", palette = 2, direction = 1)+labs(fill = "Summer NDVI 1994")+
  theme_void(base_size = 8)+
  theme(legend.position = "bottom")
ggsave(plot = map.ndvi, filename = "NDVI/map.ndvi.nsf.png", dpi = "retina", units = "cm")
knitr::include_graphics("NDVI/map.ndvi.nsf.png")

Figure 1: Map of summer NVDI in Norway, Sweden and Finland in 1994

1.4 Runoff

The runoff data was downloaded from the CORDEX database (CORDEX 2021). The data is available at https://esg-dn1.nsc.liu.se/projects/esgf-liu/.

The names of the variables in the CORDEX project follow the CF Metadata convention: http://cfconventions.org/. We used here the “mrros” variable, for surface runuff, in kg/m2/s.

According to Euro-CORDEX guidelines, we used the mean runoff over 30 years. We chose the interval 1970 - 2000 to match the temperature and precipitation data provided by WorldClim (p11 on EURO-CORDEX guidelines https://www.euro-cordex.net/imperia/md/content/csc/cordex/guidance_for_euro-cordex_climate_projections_data_use__2021-02_1_.pdf). This time period matches .

runoff.raster <- "CORDEX/mrros_Lmon_CNRM-CM6-1-HR_historical_r1i1p1f2_gr_185001-201412.nc"
runoff.nc <- nc_open(runoff.raster)

runoff.stack <- raster::stack(runoff.raster, bands = c(1441:1812)) # Correspond to January 1970 to December 2000
runoff.mean <- runoff.stack %>% mean() 

catchment.poly <- readRDS("catchment.poly.Rdata")
runoff.fennoscandia <- raster::extract(runoff.mean, catchment.poly, fun = mean, df = T, sp = T, na.rm = T)
names(runoff.fennoscandia) <- c("ebint","runoff")
saveRDS(runoff.fennoscandia,"CORDEX/runoff.fennoscandia.rds")

1.5 Temperature, precipitation and slope

Temperature and precipitation are downloaded from the database Worldclim (WorldClim n.d.). We use the average of the time interval 1970-2000, stored in the “Bioclimatic variable” dataset, available for download at: https://www.worldclim.org/data/worldclim21.html. We used the highest available resolution (30 seconds). Temperature is the mean annual temperature from 1970 to 2000 in °C, or “bio1” in the Bioclimatic dataset. The original data is in C*10 so the data is divided by 10 before use. Precipitation is the mean annual precipitation in mm or bio12 in the Bioclimatic dataset.

bio.fennoscandia <- raster::getData(name = "worldclim", var = "bio", res = 2.5)

annual.temp <- raster::extract(bio.fennoscandia[[1]], catchment.poly, fun = mean, df = T, sp = T)
saveRDS(annual.temp,"annual.temp.Rdata")

annual.prec <- raster::extract(bio.fennoscandia$bio12, catchment.poly, fun = mean, df = T, sp = T)
saveRDS(annual.prec,"annual.prec.Rdata")

The altitude data was accessed through the “getData” function, and comes from the NASA digital elevation model (SRTM 90 m). The slope was then computed using the “terrain” function from the raster package.

alt.nor <- getData("alt",country = "NOR", mask = T)
alt.swe <- getData("alt",country = "SWE", mask = T)
alt.fin <- getData("alt",country = "FIN", mask = T)

alt.list <- list(alt.nor,alt.swe,alt.fin)
alt <- do.call(raster::merge,alt.list)

alt.fennoscandia <- extract(alt,remote.set[,c("longitude","latitude")])
alt.df <- data.frame(ebint = remote.set$ebint, alt = alt.fennoscandia )
saveRDS(alt.df, "alt.df.Rdata")

slope.nor <- terrain(alt.nor, unit = "degrees")
slope.swe <- terrain(alt.swe, unit = "degrees")
slope.fin <- terrain(alt.fin, unit = "degrees")
slope.list <- list(slope.nor,slope.swe,slope.fin)
slope <- do.call(raster::merge,slope.list)

slope.nona <- reclassify(slope,cbind(NA,100), right = NA) #reintroduces NA

slope.fennoscandia <- raster::extract(slope,catchment.poly, sp = T, df = T, fun = mean)
saveRDS(slope.df, "slope.df.Rdata")

1.6 Land cover

The Land Cover data was downloaded from https://land.copernicus.eu/pan-european/corine-land-cover/lcc-2000-2006. The previous version (1995-2000) excludes Norway so the 2000 version was preferred.

The land cover data was downloaded as a raster (.tiff file), which included the legend recording the code for each land use. The raster and legend were assembled in QGIS before being imported as a shapefile in R. The codes (corresponding to the line in the extracted dataframe) of the categories of interest were:

Arable land:

  • 12: non-irrigated arable land
  • 16: fruit trees and berries plantations
  • 18: pastures
  • 19: annual crops associated with permanent crops
  • 20: Complex cultivation patterns

Bogs:

  • 36: peat bogs
  • 35: inland marshes, excluded because all zeros in the studied area.

Forest:

  • 23: Broad-leaved forest
  • 24: Coniferous forest
  • 25: Mixed forest

Bare:

  • 31: Bare rocks
  • 32: Sparsely vegetated areas
  • 33: Burnt areas

The map of these combined categories is shown in Figure 2.

knitr::include_graphics("Corine_Land_Cover/Map_clc.png") 

Figure 2: Map of Land Cover in Norway, Sweden and Finland in 2000
corine <- raster::raster("Corine_Land_Cover/U2006_CLC2000_V2020_20u1.tif")
clc.remote <- raster::extract(corine,catchment.poly.corine)

saveRDS(clc.remote,"clc.remote.Rdata")

The proportion of each land-cover category was computed for each catchment.

clc.remote <- readRDS("clc.remote.Rdata")
clc.tab <- sapply(clc.remote, function(x) tabulate(x,45))
clc.tab.area <- clc.tab*prod(res(corine))
catchment.area <- colSums(clc.tab.area)
clc.tab.prop <- sweep(clc.tab.area,2,catchment.area, FUN = "/")
names(clc.tab.prop) <- catchment.poly.corine$ebint
saveRDS(clc.tab.prop,"clc.tab.prop.Rdata")

Then, a dataframe for the category of interest (bogs, arable land, forest, bare land) was created and saved.

clc.tab.prop <- readRDS("clc.tab.prop.Rdata") %>% as.data.frame()

bogs <- clc.tab.prop[36,] %>% t() %>% as.data.frame() %>% setNames("bogs") %>% tibble::rownames_to_column(var = "ebint")
saveRDS(bogs,"bogs.rds")
arable <- colSums(clc.tab.prop[c(12,16,18,19,20),]) %>% as.data.frame() %>% setNames("arable") %>% tibble::rownames_to_column(var = "ebint")
saveRDS(arable,"arable.rds")
forest <- colSums(clc.tab.prop[c(23,24,25),]) %>% as.data.frame() %>% setNames("forest") %>% tibble::rownames_to_column(var = "ebint")
saveRDS(forest,"forest.rds")
bare <- colSums(clc.tab.prop[c(31,32,33),]) %>% as.data.frame() %>% setNames("bare") %>% tibble::rownames_to_column(var = "ebint")
saveRDS(bare,"bare.rds")

1.7 Nitrogen and sulfur deposition

The nitrogen and sulfur deposition data were extracted from the EMEP database (EMEP 2022). The data from 2000 was used as it is the first model available.

N-deposition (NOx and NH3) is the sum of:

  • Dry deposition of oxidized nitrogen per m2 grid DDEP_OXN_m2Grid, in mg/m2
  • Wet deposition of oxidized nitrogen WDEP_OXN, in mg/m2
  • Dry deposition of oxidized nitrogen per m2 grid DDEP_RDN_m2Grid, in mg/m2
  • Wet deposition of reduced nitrogen WDEP_RDN, in mg/m2
EMEP_file <- "EMEP/EMEP01_rv4.42_year.2000met_2000emis_rep2021.nc"
catchment_poly <- readRDS("catchment.poly.Rdata")

woxn <- raster(EMEP_file, varname = "WDEP_OXN") 
doxn <- raster(EMEP_file, varname = "DDEP_OXN_m2Grid")
wrdn <- raster(EMEP_file, varname = "WDEP_RDN")
drdn <- raster(EMEP_file, varname = "DDEP_RDN_m2Grid")

woxn_df <- extract(woxn,catchment.poly, sp = T, df = T, na.rm = T, fun = mean) 
names(woxn_df) <- c("ebint","woxn")
doxn_df <- extract(doxn,catchment.poly, sp = T, df = T, na.rm = T, fun = mean)
names(doxn_df) <- c("ebint","doxn")
wrdn_df <- extract(wrdn,catchment.poly, sp = T, df = T, na.rm = T, fun = mean)
names(wrdn_df) <- c("ebint","wrdn")
drdn_df <- extract(drdn,catchment.poly, sp = T, df = T, na.rm = T, fun = mean)
names(drdn_df) <- c("ebint","drdn")

ndep.df <- merge(woxn_df,doxn_df, by = "ebint") %>% merge(wrdn_df, by = "ebint") %>% merge(drdn_df, by = "ebint")
ndep.df$tndep <- rowSums(ndep.df@data[,c(2:5)])

saveRDS(ndep.df,"ndep.df.Rdata")

The S-deposition is composed of the sum of:

  • Dry deposition of oxidized sulphur per m2 grid DDEP_SOX_m2Grid, in mg/m2
  • Wet deposition of oxidized sulphur WDEP_SOX, in mg/m2
library(ncdf4)
EMEP_file <- "EMEP/EMEP01_rv4.42_year.2000met_2000emis_rep2021.nc"
catchment_poly <- readRDS("catchment.poly.Rdata")

wsox <- raster(EMEP_file, varname = "WDEP_SOX") 
dsox <- raster(EMEP_file, varname = "DDEP_SOX_m2Grid")

wsox_df <- extract(wsox,catchment_poly, sp = T, df = T, na.rm = T, fun = mean) 
names(wsox_df) <- c("ebint","wsox")

dsox_df <- extract(dsox,catchment_poly, sp = T, df = T, na.rm = T, fun = mean)
names(dsox_df) <- c("ebint","dsox")

sdep.df <- merge(wsox_df,dsox_df, by = "ebint")
sdep.df$tsdep <- rowSums(sdep.df@data[,c(2:3)])

saveRDS(sdep.df,"sdep.df.rds")

1.8 Gathering and cleaning the data

The data was gathered and merged into a single dataframe, based on the catchment identifyer (“ebint”).

country <- readRDS("country.Rdata")
toc.df <- readRDS("toc.df.Rdata")
summer.ndvi <- readRDS("summer.ndvi.Rdata")
runoff.fennoscandia <- readRDS("CORDEX/runoff.fennoscandia.rds")

annual.temp <- readRDS("annual.temp.Rdata")
names(annual.temp) <- c("ebint","temp")

annual.prec <- readRDS("annual.prec.Rdata")
names(annual.prec) <- c("ebint","prec")

slope.fennoscandia <- readRDS("slope.fennoscandia.Rdata")
names(slope.fennoscandia) <- c("ebint","slope")

alt.df <- readRDS("alt.df.Rdata")

bogs <- readRDS("bogs.rds")
arable <- readRDS("arable.rds")
forest <- readRDS("forest.rds")
bare <- readRDS("bare.rds")

ndep <- readRDS("ndep.df.Rdata")
sdep <- readRDS("sdep.df.rds")

fennoscandia_raw <- merge(country,toc.df, by = "ebint") %>% 
  merge(summer.ndvi, by = "ebint") %>% 
  merge(runoff.fennoscandia,by = "ebint") %>% 
  merge(annual.temp, by ="ebint") %>% 
  merge(annual.prec, by = "ebint") %>%
  merge(slope.fennoscandia, by = "ebint") %>%
  merge(alt.df, by = "ebint") %>%
  merge(bogs, by = "ebint") %>%
  merge(arable, by = "ebint")%>%
  merge(forest,by = "ebint") %>%
  merge(bare, by = "ebint") %>%
  merge(ndep, by = "ebint") %>% 
  merge(sdep, by = "ebint")

fennoscandia_raw$uncertain <- ifelse(fennoscandia_raw$dist_closest_ebint/fennoscandia_raw$dist_2nd_closest_ebint > 0.5, FALSE, TRUE)
fennoscandia_raw$uncertain[which(is.na(fennoscandia_raw$uncertain)==TRUE)] <- FALSE

fennoscandia_raw <- fennoscandia_raw %>% filter(toc_mg_l >= 0)
fennoscandia_raw <- fennoscandia_raw %>% filter(runoff > 0)
fennoscandia_raw <- fennoscandia_raw %>% filter(ndvi.value > 0)
fennoscandia_raw <- fennoscandia_raw %>% filter(is.na(slope) == F)
fennoscandia_raw <- fennoscandia_raw %>% filter(alt != 0)
fennoscandia_raw <- fennoscandia_raw %>% filter(uncertain == FALSE)
fennoscandia_raw$uncertain <- NULL

saveRDS(fennoscandia_raw,"fennoscandia_raw.rds")

1.8.1 Distribution of variables

The distribution of the variables were checked with histograms.

fennoscandia_raw <- readRDS("fennoscandia_raw.rds")

fennoscandia_hist <- dplyr::select(fennoscandia_raw,!c("ebint","nation","dist_closest_ebint","dist_2nd_closest_ebint","ndvi","flag.value")) 

titles <- c("TOC","Longitude","Latitude","NDVI","Runoff (mm)","Temperature (°C)","Precipitation (mm)","Slope","Altitude (m)","Bogs","Arable","Forest","Bare land", "Wet Oxidized Nitrogen Deposition, mg/m2","Dry Oxidized Nitrogen Deposition (mg/m2)", "Wet reduced nitrogen deposition (mg/m2)","Dry reduced Nitrogen deposition (mg/m2)", "Total Nitrogen Deposition (mg/m2)", "Wet sulphur deposition (mg/L)", "Dry sulphur deposition (mg/m2)","Total sulphur deposition (mg/m2)")
  
h_list <- c()

for (i in 1:length(names(fennoscandia_hist))){
  if(IQR(fennoscandia_hist[,i], na.rm = T) > 0.1){
    hi <- ggplot(data = fennoscandia_hist)+geom_histogram(aes_string(x = names(fennoscandia_hist)[i]), stat = "bin",na.rm = T,binwidth = function(x) 2*IQR(x, na.rm = T)/(length(x)^(1/3)))+
          labs(y="",title = titles[i])+theme_light(base_size = 20)
  }else{
    hi <- ggplot(data = fennoscandia_hist)+geom_histogram(aes_string(x = names(fennoscandia_hist)[i]), stat = "bin",na.rm = T)+
          labs(y="",title = titles[i])+theme_light(base_size = 20)
  }
  h_list[[i]] <- hi
}

cowplot::plot_grid(plotlist = h_list,ncol=3)

1.8.2 Transformation of variables

TOC, Runoff, Precipitation and Slope are skewed to the right so they are log-transformed. N-dep, S-dep, Bog and Arable included many zero, so they were not transformed even if they were skewed.

Moreover, all the variables were scaled and centered.

fennoscandia <- readRDS("fennoscandia_raw.rds")

v <- c("toc_mg_l","runoff","temp","prec","slope","alt","bogs","arable","forest","bare","tndep","tsdep")
names(fennoscandia)[which(names(fennoscandia) %in% v)] <- c("TOC", "Runoff","Temp","Precip","Slope","Alt","Bog","Arable","Forest","Bare","TNdep","TSdep")

fennoscandia$logTOC <- log(fennoscandia$TOC) 
fennoscandia$s.logTOC <- scale(fennoscandia$logTOC) 
fennoscandia$Runoff <- fennoscandia$Runoff*365*24*60*60 # from kg/m2/s to kg/m2/year
fennoscandia$logRunoff <- log(fennoscandia$Runoff)  
fennoscandia$s.logRunoff <- scale(fennoscandia$logRunoff)
fennoscandia$NDVI <- fennoscandia$ndvi.value
fennoscandia$s.NDVI <- scale(fennoscandia$NDVI)
fennoscandia$s.Bog <- scale(fennoscandia$Bog)
fennoscandia$s.Arable <- scale(fennoscandia$Arable)
fennoscandia$s.Forest <- scale(fennoscandia$Forest)
fennoscandia$Temp <- fennoscandia$Temp / 10 # divided by 10 because the temperature is stored in C*10
fennoscandia$s.Temp <-  scale(fennoscandia$Temp) 
fennoscandia$logPrecip <- log(fennoscandia$Precip+1e-5)
fennoscandia$s.logPrecip <- scale(fennoscandia$logPrec)
fennoscandia$s.Slope <- scale(fennoscandia$Slope)
fennoscandia$s.TNdep <- scale(fennoscandia$TNdep)
fennoscandia$s.TSdep <- scale(fennoscandia$TSdep)
fennoscandia$rdn <- fennoscandia$drdn+fennoscandia$wrdn
fennoscandia$oxn <- fennoscandia$doxn+fennoscandia$woxn
fennoscandia$s.rdn <- scale(fennoscandia$rdn)
fennoscandia$s.oxn <- scale(fennoscandia$oxn)

# removes old ndvi column
fennoscandia$ndvi <- NULL

saveRDS(fennoscandia, "fennoscandia.rds")

# create fennoscandia SpatialPointsDataFrame
fennoscandia.spdf <- SpatialPointsDataFrame(fennoscandia[,c("longitude","latitude")], fennoscandia)
saveRDS(fennoscandia.spdf, "fennoscandia.spdf.rds")

norge <- filter(fennoscandia,nation == "Norway")
saveRDS(norge, "norge.rds")

The coordinate system of the dataset was converted to UTM33.

catchment.poly <- readRDS("catchment.poly.Rdata")
wr.sf.95 <- readRDS("WR/wr.sf.95.rds")
fennoscandia.cat <- merge(fennoscandia,catchment.poly, by = "ebint") %>% st_as_sf()
fennoscandia.33 <- st_transform(fennoscandia.cat,st_crs(wr.sf.95))
fennoscandia.33$area <- st_area(fennoscandia.33$geom)
saveRDS(fennoscandia.33, "fennoscandia.33.rds")

1.9 Mapping of predictor and response variable

The TOC concentration and the predictor variables were plotted by catchment.

fennoscandia.33 <- readRDS("fennoscandia.33.rds")
m <- c("TOC","NDVI","Runoff","Precip","Temp","Bog","Arable","Forest","TNdep","TSdep")
u <- c("TOC concentration,\nmgC/L", "NDVI on scale 0 to 1", "Mean Runoff, mm/y","Mean annual\nprecipitation, mm/y","Mean annual\n temperature, °C","Proportion of bogs","Proportion of\narable land","Proportion of forest","Total nitrogen\ndeposition, mg/m2", "Total sulphur\ndeposition, mg/m2")
dir <- c(T,F,F,F,T,T,T,F,F,F)
midpoint <- c(0,0.6,median(fennoscandia.33$Runoff),median(fennoscandia.33$Precip),median(fennoscandia.33$Temp),0.3,0.3,0.5,median(fennoscandia.33$TNdep), median(fennoscandia.33$TSdep))
marg <- 0
nor <- st_read("Country_shapefile/norway.shp") %>% st_transform(crs(fennoscandia.33))
swe <- st_read("Country_shapefile/sweden.shp") %>% st_transform(crs(fennoscandia.33))
fin <- st_read("Country_shapefile/finland.shp") %>% st_transform(crs(fennoscandia.33))


for(i in m[6:7]){
  
 mysf <- fennoscandia.33

 map <- ggplot()+geom_sf(data = nor, fill = "white", size = 0.2)+
   geom_sf(data = swe, fill = "white", size = 0.2)+
   geom_sf(data = fin, fill = "white", size = 0.2)+
   geom_sf(data = mysf, aes(fill = .data[[i]], col = .data[[i]]))+
   scale_fill_continuous_divergingx(palette = "RdYlBu", aesthetics = c("fill","col"),name= paste(u[grep(i,m)],"\n(",i,")",sep=""), rev = dir[grep(i,m)], mid = midpoint[grep(i,m)]) +
   theme_minimal(base_size = 10)+
    theme(legend.position = "bottom", plot.margin = unit(c(0,marg,0,marg), units = "cm"))
    
  filename_i <- paste("map",i,"png",sep = ".")
  ggsave(plot = map, filename = filename_i, dpi = "retina", width = 10, height = 12, units = "cm")
   
 }


label_tb <- read.table(text = 
  " Label Long Lat
    Norway 7 65
    Sweden 10 55
    Finland 25 59", header = T) 
label_sf <- label_tb %>% st_as_sf(coords = c("Long", "Lat"), crs = "+proj=longlat +datum=WGS84") %>% st_transform(crs(fennoscandia.33))

for(i in m[1]){
  
 mysf <- fennoscandia.33
 midcol <- quantile(mysf[[i]], 0.99)

 map <- ggplot()+
  geom_sf_label(data = label_sf, aes(label = Label), label.size = 0.2, label.padding = unit(0.1,"lines"), size = 3)+
  geom_sf(data = nor, fill = "white", size = 0.2)+
  geom_sf(data = swe, fill = "white", size = 0.2)+
  geom_sf(data = fin, fill = "white", size = 0.2)+
  geom_sf(data = filter(mysf, TOC < 20), aes(fill = .data[[i]], col = .data[[i]]), size = 0)+
  scale_fill_continuous_divergingx(palette = "RdYlBu", aesthetics = c("fill","col"),name= paste(u[grep(i,m)],"\n(",i,")",sep=""), mid = median(mysf[[i]]), rev = dir[grep(i,m)]) +
   geom_sf(data = filter(mysf, TOC > 20),col = "sienna4", fill = "sienna4")+
   labs(x="",y="")+
   theme_minimal(base_size = 10)+
    theme(legend.position = "bottom", plot.margin = unit(c(0,marg,0,marg), units = "cm"))
    
  filename_i <- paste("map",i,"png",sep = ".")
  ggsave(plot = map, filename = filename_i, dpi = "retina", width = 10, height = 12, units = "cm")
  
}

   scale_color_manual(labels = "TOC > 20 mg/L", guide = guide_legend(override.aes = list(colour = "sienna4")))+

white.fen <- ggplot()+theme_void()+theme(plot.margin = unit(c(0,marg,0,marg), units = "cm"))
ggsave(plot = white.fen, filename = "white.fen.png", dpi = "retina", width = 10, height = 12, units = "cm")
listgrob <- list("map.toc.png","map.ndvi.png","map.Runoff.png","map.Precip.png","map.temp.png","map.Bog.png","map.arable.png","map.forest.png","map.tndep.png", "map.tsdep.png") %>% lapply(readPNG) %>% lapply(rasterGrob)
#grid.arrange(grobs = listgrob, ncol = 3)
all.map <- plot_grid(plotlist = listgrob, ncol= 3, nrow = 3, align = "v", greedy = T, labels = c("a)","b)","c)","d)","e)","f)","g)","h)","i)","j"), label_size = 20)
print(all.map)

Figure 3: Maps of DOM (TOC, a) and its predictor variables : Biomass (NDVI, b), mean runoff intensity (Runoff, c), temperature (Temp, d), precipitation intensity (Precip, e), proportion of peat (Bog, f), proportion of forest (Forest, c), proportion of arable land (Arable, d), total nitrogen deposition (TNdep, i) and total sulfur deposition (TSdep, j) in the studied catchments
fennoscandia.33 <- readRDS("fennoscandia.33.rds") %>% st_drop_geometry()
waterchem <- readxl::read_xlsx("waterchem_1995.xlsx") %>% dplyr::select(c("ebint", "lake_area_km2"))
 
m <- c("area","TOC","NDVI","Runoff","Precip","Temp","Bog","Arable","Forest","TNdep","TSdep")
fenno <- fennoscandia.33 %>% dplyr::select(c("ebint",m)) %>% merge(waterchem, by = "ebint")
fenno$area <- fenno$area

fen.min <- lapply(fenno, min) %>% as.data.frame() 
fen.max <- lapply(fenno, max) %>% as.data.frame() 
fen.median <- lapply(fenno, median) %>% as.data.frame()
fen.mean <- lapply(fenno, mean) %>% as.data.frame()
fen.sd <- lapply(fenno, sd) %>% as.data.frame()

fen.sum <- rbind(fen.min, fen.max, fen.median, fen.mean, fen.sd)
rownames(fen.sum) <- c("Min","Max","Median","Mean","Sd")

m <- c("area", "lake_area_km2","TOC","NDVI","Runoff","Precip","Temp","Bog","Arable","Forest","TNdep","TSdep")
u <- c("Catchment area, m2","Lake area km2","TOC concentration,\nmgC/L", "NDVI on scale 0 to 1", "Mean Runoff, mm/y","Mean annual\nprecipitation, mm/y","Mean annual\n temperature, °C","Proportion of bogs","Proportion of\narable land","Proportion of forest","Total nitrogen\ndeposition, mg/m2", "Total sulphur\ndeposition, mg/m2")

fen.sum %>% dplyr::select(m) %>% kable(col.names = u, digits = 2) %>% 
  kable_styling(bootstrap_options = "bordered")
Catchment area, m2 Lake area km2 TOC concentration, mgC/L NDVI on scale 0 to 1 Mean Runoff, mm/y Mean annual precipitation, mm/y Mean annual temperature, °C Proportion of bogs Proportion of arable land Proportion of forest Total nitrogen deposition, mg/m2 Total sulphur deposition, mg/m2
Min 500 [m^2] 0.00 0.20 0.06 56.06 406.75 -5.60 0.00 0.00 0.00 63.15 54.53
Max 5106701500 [m^2] 141.04 43.20 0.93 971.21 2760.50 7.75 1.00 1.00 1.00 2575.94 1220.37
Median 4589900 [m^2] 0.15 6.70 0.78 214.12 642.00 2.60 0.00 0.00 0.69 437.30 266.21
Mean 106742260 [m^2] 1.81 7.51 0.76 263.54 714.54 2.54 0.06 0.02 0.60 564.11 301.85
Sd 416462612 [m^2] 6.75 4.95 0.11 155.85 268.75 2.93 0.14 0.06 0.32 425.71 181.38

1.10 Spatial autocorrelation & VIF

The spatial autocorrelation of the response and predictor variables were computed using Moran’s I (see Supplementary 4).

The neighbor matrix is computed first.

fennoscandia.spdf <- readRDS("fennoscandia.spdf.rds")
k.neigh.w <- knearneigh(fennoscandia.spdf, k = 100) %>% knn2nb() %>% nb2listw()
saveRDS(k.neigh.w,"k.neigh.w.rds")

The spatial autocorrelation is computed for the response and predictor variables.

k.neigh.w <- readRDS("k.neigh.w.rds")
fennoscandia <- readRDS("fennoscandia.rds")

moran.list <- c() 

m <- c("logTOC","NDVI","logRunoff","logPrecip","Temp","Bog","Arable","Forest","TNdep","TSdep")

for (i in m){
  moran.list[[i]] <- moran.test(fennoscandia[,i],k.neigh.w, na.action = na.omit, alternative = "two.sided")
}

moran.df <- data.frame(m) %>% setNames("parameter")
moran.df$moran.I <- NA
moran.df$p.value <- NA

for (i in 1:length(moran.list)){
  moran.df$moran.I[i] <- moran.list[[i]]$estimate[1]
  moran.df$p.value[i] <- moran.list[[i]]$p.value
}

saveRDS(moran.df,"moran.df.rds")
Table 1: Moran’s I of the variables
moran.df <- readRDS("moran.df.rds")
moran.df %>% arrange(desc(moran.I)) %>% kable(col.names = c("Predictor","Moran's I","P-value")) %>% kable_styling(bootstrap_options = "bordered", position = "center")
Predictor Moran’s I P-value
Temp 0.8962794 0
logPrecip 0.8881238 0
TNdep 0.8772191 0
TSdep 0.8574352 0
logRunoff 0.7816173 0
logTOC 0.5805699 0
NDVI 0.5399148 0
Forest 0.4880511 0
Bog 0.1627365 0
Arable 0.1185581 0

2 Comparison LM and SELM, 5 selected predictors (NDVI, Runoff, Bog, Arable, TNdep)

2.1 Linear model on NSF, 5 selected predictors

A linear model was fitted with the 5 selected predictors, both for scaled and unscaled variables.

k.neigh.w <- readRDS("k.neigh.w.rds")
fennoscandia <- readRDS("fennoscandia.rds")

fm.s <- s.logTOC~s.NDVI+s.logRunoff+s.Bog+s.Arable+s.TNdep
s.lm.fennoscandia.5 <- lm(fm.s, data = fennoscandia, na.action = na.exclude)
saveRDS(s.lm.fennoscandia.5,"s.lm.fennoscandia.5.rds")

fm <- logTOC~NDVI+logRunoff+Bog+Arable+TNdep
lm.fennoscandia.5 <- lm(fm, data = fennoscandia, na.action = na.exclude)
saveRDS(lm.fennoscandia.5,"lm.fennoscandia.5.rds")

s.lm.r2.5 <- summary(s.lm.fennoscandia.5)$adj.r.squared
s.moran.I.res.lm.5 <- lm.morantest(s.lm.fennoscandia.5, k.neigh.w, alternative = "two.sided")

lm.r2.5 <- summary(lm.fennoscandia.5)$adj.r.squared
moran.I.res.lm.5 <- lm.morantest(lm.fennoscandia.5, k.neigh.w, alternative = "two.sided")

moran.limit.fennoscandia.5 <- 1/(dim(fennoscandia)[1]-1)

s.lm.coef.5 <- summary(s.lm.fennoscandia.5)$coefficients %>% as.data.frame()
s.lm.coef.5 <- s.lm.coef.5[-1,] %>% rownames_to_column(var = "Parameter")
saveRDS(s.lm.coef.5, "s.lm.coef.5.rds")

lm.coef.5 <- summary(lm.fennoscandia.5)$coefficients %>% as.data.frame()
lm.coef.5 <- lm.coef.5[-1,] %>% rownames_to_column(var = "Parameter")

lm.effectsize.5 <- compute.effect.size(lm.coef.5)
saveRDS(lm.effectsize.5, "lm.effectsize.5.rds")

We checked the distribution of the coefficients, a posteriori, using SIM.

sm.fennoscandia.5 <- arm::sim(lm.fennoscandia.5,n.sims = 100)

sm.coef.5 <- sm.fennoscandia.5@coef[,-1]
sm.effectsize.5 <- data.frame(parameter = colnames(sm.coef.5), effect.size = NA, std.error = NA)

n <- dim(sm.coef.5)[2]
m <- colnames(sm.coef.5)

for(i in 1:n){
  if(grepl("log",m[i])==T){
    sm.effectsize.5[i,2] <- mean(1.1^(sm.coef.5[,i]))
    sm.effectsize.5[i,3] <- sd(1.1^(sm.coef.5[,i])*log(1.1)*sm.coef.5[,i])
  }else if(m[i] %in% c("NDVI","Bog","Arable","Forest")){
    sm.effectsize.5[i,2] <- mean(exp(sm.coef.5[,i]*0.1))
    sm.effectsize.5[i,3] <- sd(0.1*exp(sm.coef.5[,i]*0.1)*sm.coef.5[,i])
  }else if(m[i] %in% c("TNdep","TSdep")){
    sm.effectsize.5[i,2] <- mean(exp(sm.coef.5[,i]*250))
    sm.effectsize.5[i,3] <- sd(250*exp(sm.coef.5[,i]*250)*sm.coef.5[,i])
  }
}

The results are summarized below.

summary.table.lm.5 <- cbind(dplyr::select(lm.coef.5,Parameter),s.lm.coef.5[,2:3],lm.coef.5[,2:3],lm.effectsize.5[,2:4],sm.effectsize.5[,2:3]) %>%
                rbind(c("AIC",AIC(s.lm.fennoscandia.5),"",AIC(lm.fennoscandia.5),"","","","","","")) %>%
                rbind(c("R2",s.lm.r2.5,"",lm.r2.5,"","","","","","")) %>%
                rbind(c("Moran'I res",s.moran.I.res.lm.5$estimate[[1]],"",moran.I.res.lm.5$estimate[[1]],"","","","","",""))
summary.table.lm.5[,-1] <- apply(summary.table.lm.5[,-1],2,as.numeric)

saveRDS(summary.table.lm.5, "summary.table.lm.5.rds")
Table 2: Results for the linear model with 5 selected predictors on Norway, Sweden and Finland
summary.table.lm.5 <- readRDS("summary.table.lm.5.rds")

n <- 5

knitr::kable(summary.table.lm.5,digits = 3, caption = "LM with 5 selected predictors") %>%
  add_header_above(c("Model"=1,"Scaled LM" = 2, "LM" = 2, "Effect size" = 3,"SIM" = 2),italic = T) %>%
  column_spec(column = 8, bold = T) %>% 
  kable_styling(bootstrap_options = "bordered") %>%
  group_rows(group_label = "Model Evaluation", start_row = n+1,end_row=n+3)
LM with 5 selected predictors
Model
Scaled LM
LM
Effect size
SIM
Parameter Estimate Std. Error Estimate Std. Error Estimate Std. Error Percent effect.size std.error
NDVI 0.496 0.011 3.939 0.089 1.040 0.004 4.018 1.480 0.017
logRunoff -0.382 0.012 -0.686 0.021 0.993 0.000 -0.681 0.937 0.002
Bog 0.126 0.010 0.806 0.062 1.008 0.001 0.810 1.085 0.007
Arable 0.010 0.010 0.131 0.129 1.001 0.000 0.131 1.013 0.014
TNdep 0.078 0.011 0.000 0.000 1.004 0.001 0.419 1.044 0.006
Model Evaluation
AIC 7028.458 6319.252
R2 0.650 0.650
Moran’I res 0.140 0.140

The residuals are plotted below.

fennoscandia.33 <- readRDS("fennoscandia.33.rds")
s.lm.fennoscandia.5 <- readRDS("s.lm.fennoscandia.5.rds")
lm.fennoscandia.5 <- readRDS("lm.fennoscandia.5.rds")

f_plotspatial(data = fennoscandia,var = s.lm.fennoscandia.5$residuals, plottitle = "a) Residuals for scaled LM")

f_plotspatial(data = fennoscandia,var = lm.fennoscandia.5$residuals, plottitle = "b) Residuals for unscaled LM")


qplot(y = s.lm.fennoscandia.5$residuals, x = as.numeric(fennoscandia.33$area))+
  labs(y ="Scaled LM residuals, 5 predictors", x = "Catchment area, m2")+
  theme_minimal()

qplot(y = lm.fennoscandia.5$residuals, x = as.numeric(fennoscandia.33$area))+
  labs(y ="SELM residuals, 5 predictors", x = "Catchment area, m2")+
  theme_minimal()

Figure 4: Residuals for linear model, a) with scaled variables, b) with unscaled variables

2.2 Spatial error linear model on NSF, 5 selected predictors (NDVI, Runoff, Bog, Arable, TNdep)

A Spatial Error Linear Model was fitted on for the catchments in Norway, Sweden and Finland, both on scaled and unscaled variables.

k.neigh.w <- readRDS("k.neigh.w.rds")
norge.kmat <- readRDS("norge.kmat.Rdata")

#lagrange <- lm.LMtests(lm.fennoscandia,k.neigh.w,test = c("LMerr","LMlag","RLMerr","RLMlag"))

fm.s <- s.logTOC~s.NDVI+s.logRunoff+s.Bog+s.Arable+s.TNdep
s.sem.fennoscandia.5 <- errorsarlm(fm.s,fennoscandia,k.neigh.w)
saveRDS(s.sem.fennoscandia.5,"s.sem.fennoscandia.5.rds")

fm <- logTOC~NDVI+logRunoff+Bog+Arable+TNdep
sem.fennoscandia.5 <- errorsarlm(fm,fennoscandia,k.neigh.w)
saveRDS(sem.fennoscandia.5,"sem.fennoscandia.5.rds")

The results of the model are displayed below.

s.sem.fennoscandia.5 <- readRDS("s.sem.fennoscandia.5.rds")
sem.fennoscandia.5 <- readRDS("sem.fennoscandia.5.rds")
k.neigh.w <- readRDS("k.neigh.w.rds")

s.sem.coef.5 <- s.sem.fennoscandia.5$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
s.sem.coef.5 <- s.sem.coef.5 %>% rownames_to_column(var = "Parameter")
s.sem.coef.5$std.error <- s.sem.fennoscandia.5$rest.se[-1]
saveRDS(s.sem.coef.5, "s.sem.coef.5.rds")

sem.coef.5 <- sem.fennoscandia.5$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
sem.coef.5 <- sem.coef.5 %>% rownames_to_column(var = "Parameter")
sem.coef.5$std.error <- sem.fennoscandia.5$rest.se[-1]


sem.effectsize.5 <- compute.effect.size(sem.coef.5)
saveRDS(sem.effectsize.5, "sem.effectsize.5.rds")

n <- length(sem.coef.5$Parameter)

s.sem.AIC.5 <- 2*(n-1)-2*s.sem.fennoscandia.5$LL
sem.AIC.5 <- 2*(n-1)-2*sem.fennoscandia.5$LL

s.moran.I.res.sem.5 <- moran.test(s.sem.fennoscandia.5$residuals, k.neigh.w, alternative = "two.sided")

moran.I.res.sem.5 <- moran.test(sem.fennoscandia.5$residuals, k.neigh.w, alternative = "two.sided")

summary.table.sem.5 <- cbind(dplyr::select(sem.coef.5,Parameter),s.sem.coef.5[,2:3],sem.coef.5[,2:3],sem.effectsize.5[,2:4]) %>%
                rbind(c("AIC",s.sem.AIC.5,"",sem.AIC.5,"","","","")) %>%
                rbind(c("Moran'I res", s.moran.I.res.sem.5$estimate[[1]],"", moran.I.res.sem.5$estimate[[1]], "","","",""))
summary.table.sem.5[,-1] <- apply(summary.table.sem.5[,-1],2,as.numeric)

saveRDS(summary.table.sem.5, "summary.table.sem.5.rds")
Table 2: Results for the linear model with 5 selected predictors on Norway, Sweden and Finland
summary.table.sem.5 <- readRDS("summary.table.sem.5.rds")

n <- 5

knitr::kable(summary.table.sem.5,digits = 3, caption = "SELM with 5 selected predictors") %>%
  add_header_above(c("Model"=1,"Scaled SELM" = 2, "SELM" = 2, "Effect size" = 3),italic = T) %>%
  kable_styling(bootstrap_options = "bordered") %>%
  column_spec(column = 8, bold = T) %>% 
  group_rows(group_label = "Model Evaluation", start_row = n+1,end_row=n+2)
SELM with 5 selected predictors
Model
Scaled SELM
SELM
Effect size
Parameter Estimate std.error Estimate std.error Estimate std.error Percent
NDVI 0.404 0.014 3.213 0.108 1.033 0.003 3.265
logRunoff -0.149 0.023 -0.268 0.041 0.997 0.000 -0.266
Bog 0.140 0.009 0.896 0.060 1.009 0.001 0.900
Arable 0.008 0.009 0.102 0.121 1.001 0.000 0.102
TNdep 0.096 0.033 0.000 0.000 1.005 0.002 0.521
Model Evaluation
AIC 6241.363 5532.157
Moran’I res 0.008 0.008

The residuals for scaled and unscaled models are presented in Figure 5

fennoscandia.33 <- readRDS("fennoscandia.33.rds")

s.sem.fennoscandia.5 <- readRDS("s.sem.fennoscandia.5.rds")
sem.fennoscandia.5 <- readRDS("sem.fennoscandia.5.rds")

f_plotspatial(data = fennoscandia,var = s.sem.fennoscandia.5$residuals, plottitle = "a) Residuals for scaled SELM")

f_plotspatial(data = fennoscandia,var = sem.fennoscandia.5$residuals, plottitle = "b) Residuals for unscaled SELM")


qplot(y = s.sem.fennoscandia.5$residuals, x = as.numeric(fennoscandia.33$area))+
  labs(y ="Scaled SELM residuals, 5 predictors", x = "Catchment area, m2")+
  theme_minimal()

qplot(y = sem.fennoscandia.5$residuals, x = as.numeric(fennoscandia.33$area))+
  labs(y ="SELM residuals, 5 predictors", x = "Catchment area, m2")+
  theme_minimal()

Figure 5: Map of residuals for a) scaled and b) unscaled SELM

2.3 Comparative plots 5

s.lm.coef.5 <- readRDS("s.lm.coef.5.rds")
s.sem.coef.5 <- readRDS("s.sem.coef.5.rds")

n <- c(names(s.sem.coef.5),"Model")
m <- cbind.data.frame(s.lm.coef.5[,1:3],Model = "LM") %>% setNames(n)
o <- cbind.data.frame(s.sem.coef.5,Model = "SELM")
  
scaled.df.5 <- rbind(m,o)
scaled.df.5$Parameter <- factor(scaled.df.5$Parameter, levels = c("s.TNdep","s.Arable","s.Bog","s.logRunoff","s.NDVI"))

scaled.plot.5 <- ggplot(scaled.df.5, aes(group = Model))+geom_col(aes(x=Estimate,y=Parameter, fill = Model), position = "dodge", width = 0.5)+
  scale_fill_manual(values = c("#d73027","#4575b4"))+
  geom_errorbar(aes(y=Parameter,xmin = Estimate-std.error,xmax=Estimate+std.error), position = "dodge", width = 0.5)+
  labs(x = "Scaled Estimate", y = "Model with 5 predictors")+
  theme_bw(base_size = 8)+theme(legend.position = "bottom")
sem.effectsize.5 <- readRDS("sem.effectsize.5.rds")
lm.effectsize.5 <- readRDS("lm.effectsize.5.rds")

n <- c(names(sem.effectsize.5),"Model")
m <- cbind.data.frame(lm.effectsize.5,Model = "LM") %>% setNames(n)
o <- cbind.data.frame(sem.effectsize.5,Model = "SELM")
  
h.df.5 <- rbind(m,o)
h.df.5$Parameter <- factor(h.df.5$Parameter, levels = c("TNdep","Arable","Bog","logRunoff","NDVI"))

effectsize.plot.5 <- ggplot(h.df.5, aes(group = Model))+
  geom_col(aes(x=Percent,y=Parameter, fill = Model), position = "dodge", width = 0.5)+
  scale_fill_manual(values = c("#d73027","#4575b4"))+
  geom_errorbar(aes(y=Parameter,xmin = Percent-(100*std.error),xmax=Percent+(100*std.error)), position = "dodge", width = 0.5)+
  labs(x = "Effect size in %", y = "")+
  theme_bw(base_size = 8)+theme(legend.position = "bottom")
compare.5 <- cowplot::plot_grid(plotlist = list(scaled.plot.5,effectsize.plot.5),ncol=2)
save_plot(plot = compare.5, filename = "compare.lm.selm.5.png")
print(compare.5)

Figure 6: Comparison of scaled estimates for LM and SELM (left) and of the effect sizes for each predictor (right) with NDVI, logRunoff, Bog, Arable and TNdep as predictors

3 SELM with other predictors

3.1 Comparison of linear model and spatial error model, 9 predictors

The linear model and the spatial error linear model were fitted with all 9 predictors (including predictors correlated to each other or having a high Variance Inflation Factor).

3.1.1 Linear model on NSF, 9 predictors

k.neigh.w <- readRDS("k.neigh.w.rds")
fennoscandia <- readRDS("fennoscandia.rds")

fm.s <- s.logTOC~s.NDVI+s.logRunoff+s.Bog+s.Arable+s.Forest+s.TNdep+s.TSdep+s.Temp+s.logPrecip
s.lm.fennoscandia.9 <- lm(fm.s, data = fennoscandia, na.action = na.exclude)
saveRDS(s.lm.fennoscandia.9,"s.lm.fennoscandia.9.rds")

fm <- logTOC~NDVI+logRunoff+Bog+Arable+Forest+TNdep+TSdep+Temp+logPrecip
lm.fennoscandia.9 <- lm(fm, data = fennoscandia, na.action = na.exclude)
saveRDS(lm.fennoscandia.9,"lm.fennoscandia.9.rds")

s.lm.r2 <- summary(s.lm.fennoscandia.9)$adj.r.squared
s.moran.I.res.lm <- lm.morantest(s.lm.fennoscandia.9, k.neigh.w, alternative = "two.sided")

lm.r2 <- summary(lm.fennoscandia.9)$adj.r.squared
moran.I.res.lm <- lm.morantest(lm.fennoscandia.9, k.neigh.w, alternative = "two.sided")

moran.limit.fennoscandia <- 1/(dim(fennoscandia)[1]-1)

s.lm.coef.9 <- summary(s.lm.fennoscandia.9)$coefficients %>% as.data.frame()
s.lm.coef.9 <- s.lm.coef.9[-1,] %>% rownames_to_column(var = "Parameter")
saveRDS(s.lm.coef.9, "s.lm.coef.9.rds")

lm.coef.9 <- summary(lm.fennoscandia.9)$coefficients %>% as.data.frame()
lm.coef.9 <- lm.coef.9[-1,] %>% rownames_to_column(var = "Parameter")

lm.effectsize.9 <- compute.effect.size(lm.coef.9)
saveRDS(lm.effectsize.9, "lm.effectsize.9.rds")
summary.table.lm.9 <- cbind(dplyr::select(lm.coef.9,Parameter),s.lm.coef.9[,2:3],lm.coef.9[,2:3],lm.effectsize.9[,2:4]) %>%
                rbind(c("AIC",AIC(s.lm.fennoscandia.9),"",AIC(lm.fennoscandia.9),"","","")) %>%
                rbind(c("R2",s.lm.r2,"",lm.r2,"","","")) %>%
                rbind(c("Moran'I res",s.moran.I.res.lm$estimate[[1]],"",moran.I.res.lm$estimate[[1]],"","",""))
summary.table.lm.9[,-1] <- apply(summary.table.lm.9[,-1],2,as.numeric)

saveRDS(summary.table.lm.9, "summary.table.lm.9.rds")
Table 3: Summary table for linear model with 9 predictors
summary.table.lm.9 <- readRDS("summary.table.lm.9.rds")

n <- 9

knitr::kable(summary.table.lm.9,digits = 3, caption = "LM with 9 predictors") %>%
  add_header_above(c("Model"=1,"Scaled LM" = 2, "LM" = 2, "Effect size" = 3),italic = T) %>%
  column_spec(column = 8, bold = T) %>% 
  kable_styling(bootstrap_options = "bordered") %>%
  group_rows(group_label = "Model Evaluation", start_row = n+1,end_row=n+3)
LM with 9 predictors
Model
Scaled LM
LM
Effect size
Parameter Estimate Std. Error Estimate Std. Error Estimate Std. Error Percent
NDVI 0.332 0.012 2.638 0.097 1.027 0.003 2.673
logRunoff 0.005 0.019 0.009 0.035 1.000 0.000 0.009
Bog 0.169 0.009 1.082 0.060 1.011 0.001 1.088
Arable 0.017 0.010 0.224 0.125 1.002 0.000 0.225
Forest 0.132 0.014 0.365 0.038 1.004 0.000 0.366
TNdep -0.059 0.038 0.000 0.000 0.997 0.002 -0.315
TSdep 0.004 0.042 0.000 0.000 1.000 0.002 0.021
Temp 0.433 0.020 0.135 0.006 1.001 0.000 0.136
logPrecip -0.384 0.019 -1.070 0.053 0.989 0.001 -1.059
Model Evaluation
AIC 6167.584 5458.379
R2 0.719 0.719
Moran’I res 0.072 0.072
s.lm.fennoscandia.9 <- readRDS("s.lm.fennoscandia.9.rds")
lm.fennoscandia.9 <- readRDS("lm.fennoscandia.9.rds")

f_plotspatial(data = fennoscandia,var = s.lm.fennoscandia.9$residuals, plottitle = "a) Residuals for scaled LM, with all predictors")

f_plotspatial(data = fennoscandia,var = lm.fennoscandia.9$residuals, plottitle = "b) Residuals for unscaled LM, with all predictors")

Figure 7: Map of residuals for a) scaled and b) unscaled LM with 9 predictors

3.1.2 Spatial error linear model on NSF, 9 predictors

k.neigh.w <- readRDS("k.neigh.w.rds")

fm.s <- s.logTOC~s.NDVI+s.logRunoff+s.Bog+s.Arable+s.Forest+s.TNdep+s.TSdep+s.Temp+s.logPrecip
s.sem.fen.9 <- errorsarlm(fm.s,fennoscandia,k.neigh.w)
saveRDS(s.sem.fen.9,"s.sem.fen.9.Rdata")

fm <- logTOC~NDVI+logRunoff+Bog+Arable+Forest+TNdep+TSdep+Temp+logPrecip
sem.fen.9 <- errorsarlm(fm,fennoscandia,k.neigh.w)
saveRDS(sem.fen.9,"sem.fen.9.Rdata")
s.sem.fen.9 <- readRDS("s.sem.fen.9.Rdata")
sem.fen.9 <- readRDS("sem.fen.9.Rdata")
k.neigh.w <- readRDS("k.neigh.w.rds")


s.sem.coef <- s.sem.fen.9$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
s.sem.coef <- s.sem.coef %>% rownames_to_column(var = "Parameter")
s.sem.coef$std.error <- s.sem.fen.9$rest.se[-1]
saveRDS(s.sem.coef, "s.sem.coef.9.rds")

sem.coef <- sem.fen.9$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
sem.coef <- sem.coef %>% rownames_to_column(var = "Parameter")
sem.coef$std.error <- sem.fen.9$rest.se[-1]


sem.effectsize <- compute.effect.size(sem.coef)
saveRDS(sem.effectsize, "sem.effectsize.9.rds")

n <- length(sem.coef$Parameter)

s.sem.AIC <- 2*(n-1)-2*s.sem.fen.9$LL
sem.AIC <- 2*(n-1)-2*sem.fen.9$LL

s.moran.I.res.sem <- moran.test(s.sem.fen.9$residuals, k.neigh.w, alternative = "two.sided")

moran.I.res.sem <- moran.test(sem.fen.9$residuals, k.neigh.w, alternative = "two.sided")

summary.table.sem <- cbind(dplyr::select(sem.coef,Parameter),s.sem.coef[,2:3],sem.coef[,2:3],sem.effectsize[,2:4]) %>%
                rbind(c("AIC",s.sem.AIC,"",sem.AIC,"","","","")) %>%
                rbind(c("Moran'I res", s.moran.I.res.sem$estimate[[1]],"", moran.I.res.sem$estimate[[1]], "","","",""))
summary.table.sem[,-1] <- apply(summary.table.sem[,-1],2,as.numeric)

saveRDS(summary.table.sem, "summary.table.sem.9.rds")
Table 4: Summary table for spatial error linear model with 9 predictors
summary.table.sem.9 <- readRDS("summary.table.sem.9.rds")

n <- 9

knitr::kable(summary.table.sem.9,digits = 3, caption = "SELM with 9 predictors") %>%
  add_header_above(c("Model"=1,"Scaled SELM" = 2, "SELM" = 2, "Effect size" = 3),italic = T) %>%
  kable_styling(bootstrap_options = "bordered") %>%
  column_spec(column = 8, bold = T) %>% 
  group_rows(group_label = "Model Evaluation", start_row = n+1,end_row=n+2)
SELM with 9 predictors
Model
Scaled SELM
SELM
Effect size
Parameter Estimate std.error Estimate std.error Estimate std.error Percent
NDVI 0.246 0.015 1.956 0.120 1.020 0.002 1.975
logRunoff -0.030 0.023 -0.053 0.042 0.999 0.000 -0.053
Bog 0.159 0.009 1.016 0.060 1.010 0.001 1.021
Arable 0.010 0.010 0.126 0.124 1.001 0.000 0.126
Forest 0.124 0.014 0.342 0.038 1.003 0.000 0.343
TNdep -0.092 0.079 0.000 0.000 0.995 0.004 -0.497
TSdep -0.063 0.077 0.000 0.000 0.997 0.004 -0.346
Temp 0.553 0.040 0.173 0.012 1.002 0.000 0.173
logPrecip -0.391 0.034 -1.090 0.096 0.989 0.001 -1.079
Model Evaluation
AIC 5850.232 5141.026
Moran’I res 0.007 0.007
s.sem.fen.9 <- readRDS("s.sem.fen.9.Rdata")
sem.fen.9 <- readRDS("sem.fen.9.Rdata")

f_plotspatial(data = fennoscandia,var = s.sem.fen.9$residuals, plottitle = "a) Residuals for scaled SEM, with all predictors")

f_plotspatial(data = fennoscandia,var = sem.fen.9$residuals, plottitle = "b) Residuals for unscaled SELM, with all predictors")

Figure 8: Map of residuals for a) scaled and b) unscaled SELM with 9 predictors

3.1.3 Comparative plots, 9 predictors

s.lm.coef.9 <- readRDS("s.lm.coef.9.rds")
s.sem.coef.9 <- readRDS("s.sem.coef.9.rds")

n <- c(names(s.sem.coef.9),"Model")
m <- cbind.data.frame(s.lm.coef.9[,1:3],Model = "LM") %>% setNames(n)
o <- cbind.data.frame(s.sem.coef.9,Model = "SELM")
  
scaled.df <- rbind(m,o)
scaled.df$Parameter <- factor(scaled.df$Parameter, levels = c("s.TNdep","s.TSdep","s.Forest","s.Arable","s.Bog","s.logRunoff","s.Temp","s.logPrecip","s.NDVI"))

scaled.plot.9 <- ggplot(scaled.df, aes(group = Model))+geom_col(aes(x=Estimate,y=Parameter, fill = Model), position = "dodge", width = 0.5)+ scale_fill_manual(values = c("#d73027","#4575b4"))+
  geom_errorbar(aes(y=Parameter,xmin = Estimate-std.error,xmax=Estimate+std.error), position = "dodge", width = 0.5)+
  labs(x = "Scaled Estimate", y = "Model with 9 predictors")+
  theme_bw(base_size = 8)+theme(legend.position = "bottom")
lm.effectsize.9 <- readRDS("lm.effectsize.9.rds")
sem.effectsize.9 <- readRDS("sem.effectsize.9.rds")

n <- c(names(sem.effectsize.9),"Model")
m <- cbind.data.frame(lm.effectsize.9,Model = "LM") %>% setNames(n)
o <- cbind.data.frame(sem.effectsize.9,Model = "SELM")
  
h.df <- rbind(m,o)
h.df$Parameter <- factor(h.df$Parameter, levels = c("TNdep","TSdep","Forest","Arable","Bog","logRunoff","Temp","logPrecip","NDVI"))

effectsize.plot.9 <- ggplot(h.df, aes(group = Model))+
  geom_col(aes(x=Percent,y=Parameter, fill = Model), position = "dodge", width = 0.5)+ scale_fill_manual(values = c("#d73027","#4575b4"))+
  geom_errorbar(aes(y=Parameter,xmin = Percent-(100*std.error),xmax=Percent+(100*std.error)), position = "dodge", width = 0.5)+
  labs(x = "Effect size in %", y = "")+
  theme_bw(base_size = 8)+theme(legend.position = "bottom")
compare.9 <- cowplot::plot_grid(plotlist = list(scaled.plot.9,effectsize.plot.9),ncol=2)
save_plot(plot = compare.9, filename = "compare.lm.selm.9.png")
print(compare.9)

Figure 9: Comparison of scaled estimates for LM and SELM (left) and of the effect sizes for each predictor (right) with 9 predictors

3.2 Spatial error linear model with NDVI, Bog and logRunoff

k.neigh.w <- readRDS("k.neigh.w.rds")
fennoscandia <- readRDS("fennoscandia.rds")

s.fm <- s.logTOC~s.NDVI+s.logRunoff+s.Bog
s.sem.fen.3 <- errorsarlm(s.fm,fennoscandia,k.neigh.w)
saveRDS(s.sem.fen.3,"s.sem.fen.3.rds")

fm <- logTOC~NDVI+logRunoff+Bog
sem.fen.3 <- errorsarlm(fm,fennoscandia,k.neigh.w)
saveRDS(sem.fen.3,"sem.fen.3.rds")
s.sem.fen.3 <- readRDS("s.sem.fen.3.rds")
sem.fen.3 <- readRDS("sem.fen.3.rds")
k.neigh.w <- readRDS("k.neigh.w.rds")

s.sem.coef.3 <- s.sem.fen.3$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
s.sem.coef.3 <- s.sem.coef.3 %>% rownames_to_column(var = "Parameter")
s.sem.coef.3$std.error <- s.sem.fen.3$rest.se[-1]
saveRDS(s.sem.coef.3, "s.sem.coef.3.rds")

sem.coef.3 <- sem.fen.3$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
sem.coef.3 <- sem.coef.3 %>% rownames_to_column(var = "Parameter")
sem.coef.3$std.error <- sem.fen.3$rest.se[-1]

sem.effectsize.fen.3 <- compute.effect.size(sem.coef.3)
saveRDS(sem.effectsize.fen.3, "sem.effectsize.fen.3.rds")


n <- length(sem.coef.3$Parameter)

s.sem.AIC.3 <- 2*(n-1)-2*s.sem.fen.3$LL
s.moran.I.res.sem.3 <- moran.test(s.sem.fen.3$residuals, k.neigh.w, alternative = "two.sided")

sem.AIC.3 <- 2*(n-1)-2*sem.fen.3$LL
moran.I.res.sem.3 <- moran.test(sem.fen.3$residuals, k.neigh.w, alternative = "two.sided")

summary.table.simple.sem.3 <- cbind(dplyr::select(sem.coef.3,Parameter),s.sem.coef.3[,2:3],sem.coef.3[,2:3],sem.effectsize.fen.3[,2:4]) %>%
                rbind(c("AIC",s.sem.AIC.3,"",sem.AIC.3,"","","","")) %>%
                rbind(c("Moran'I res", s.moran.I.res.sem.3$estimate[[1]],"",moran.I.res.sem.3$estimate[[1]], "","","",""))
summary.table.simple.sem.3[,-1] <- apply(summary.table.simple.sem.3[,-1],2,as.numeric)

saveRDS(summary.table.simple.sem.3, "summary.table.simple.sem.3.rds")
Table 5: Summary table for SELM model with 3 predictors
summary.table.simple.sem.3 <- readRDS("summary.table.simple.sem.3.rds")
n <- 3

knitr::kable(summary.table.simple.sem.3,digits = 3, caption = "SELM with 3 predictors (NDVI, Bog, Runoff") %>%
  add_header_above(c("Model"=1,"Scaled SELM" = 2,"SELM" = 2, "Effect size" = 3),italic = T) %>%
  kable_styling(bootstrap_options = "bordered") %>%
  column_spec(column = 8, bold = T) %>% 
  group_rows(group_label = "Model Evaluation", start_row = n+1,end_row=n+2)
SELM with 3 predictors (NDVI, Bog, Runoff
Model
Scaled SELM
SELM
Effect size
Parameter Estimate std.error Estimate std.error Estimate std.error Percent
NDVI 0.408 0.014 3.238 0.107 1.033 0.003 3.291
logRunoff -0.181 0.020 -0.325 0.037 0.997 0.000 -0.323
Bog 0.139 0.009 0.890 0.060 1.009 0.001 0.894
Model Evaluation
AIC 6246.702 5537.496
Moran’I res 0.008 0.008
s.sem.fennoscandia.3 <- readRDS("s.sem.fen.3.rds")
sem.fennoscandia.3 <- readRDS("sem.fen.3.rds")
fennoscandia <- readRDS("fennoscandia.rds")

f_plotspatial(data = fennoscandia,var = s.sem.fennoscandia.3$residuals, plottitle = "Residuals for scaled SEM, with NDVI, Runoff and Bogs")

f_plotspatial(data = fennoscandia,var = sem.fennoscandia.3$residuals, plottitle = "Residuals for unscaled SELM, with NDVI, Runoff and Bogs")

Figure 10: Map of residuals for a) scaled and b) unscaled SELM with 3 predictors
sem.coef.3 <- readRDS("s.sem.coef.3.rds")

scaled.plot.3 <- ggplot(s.sem.coef.3)+geom_col(aes(x=Estimate,y=Parameter), width = 0.5, fill = "#d73027")+
  geom_errorbar(aes(y=Parameter,xmin = Estimate-std.error,xmax=Estimate+std.error), width = 0.5)+
  labs(x = "Scaled estimates",y = "Predictor")+
  theme_bw(base_size = 8)+
  theme(legend.position = "bottom")
sem.effectsize.fen.3 <- readRDS("sem.effectsize.fen.3.rds")

effectsize.plot.3 <- ggplot(sem.effectsize.fen.3)+
  geom_col(aes(x=Percent,y=Parameter), width = 0.5, fill = "#4575b4")+
  geom_errorbar(aes(y=Parameter,xmin = Percent-(100*std.error),xmax=Percent+(100*std.error)), position = "dodge", width = 0.5)+
  labs(x = "Effect size in %", y = "")+
  theme_bw(base_size = 8)+theme(legend.position = "bottom")
compare.3 <- cowplot::plot_grid(plotlist = list(scaled.plot.3,effectsize.plot.3),ncol=2)
save_plot(plot = compare.3, filename = "compare.lm.selm.3.png" )
print(compare.3)

Figure 11: Comparison of scaled estimates for LM and SELM (left) and of the effect sizes for each predictor (right) with 9 predictors

3.3 Spatial error linear model with NDVI, Bog and logPrecip

k.neigh.w <- readRDS("k.neigh.w.rds")
fennoscandia <- readRDS("fennoscandia.rds")

s.fm <- s.logTOC~s.NDVI+s.logPrecip+s.Bog
s.sem.fen.precip <- errorsarlm(s.fm,fennoscandia,k.neigh.w)
saveRDS(s.sem.fen.precip,"s.sem.fen.precip.rds")

fm <- logTOC~NDVI+logPrecip+Bog
sem.fen.precip <- errorsarlm(fm,fennoscandia,k.neigh.w)
saveRDS(sem.fen.precip,"sem.fen.precip.rds")
s.sem.fen.precip <- readRDS("s.sem.fen.precip.rds")
sem.fen.precip <- readRDS("sem.fen.precip.rds")
k.neigh.w <- readRDS("k.neigh.w.rds")

s.sem.coef.precip <- s.sem.fen.precip$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
s.sem.coef.precip <- s.sem.coef.precip %>% rownames_to_column(var = "Parameter")
s.sem.coef.precip$std.error <- s.sem.fen.precip$rest.se[-1]
saveRDS(s.sem.coef.precip, "s.sem.coef.precip.rds")

sem.coef.precip <- sem.fen.precip$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
sem.coef.precip <- sem.coef.precip %>% rownames_to_column(var = "Parameter")
sem.coef.precip$std.error <- sem.fen.precip$rest.se[-1]

sem.effectsize.precip <- compute.effect.size(sem.coef.precip)
saveRDS(sem.effectsize.precip, "sem.effectsize.precip.rds")


n <- length(sem.coef.precip$Parameter)

s.sem.AIC.precip <- 2*(n-1)-2*s.sem.fen.precip$LL
s.moran.I.res.sem.precip <- moran.test(s.sem.fen.precip$residuals, k.neigh.w, alternative = "two.sided")

sem.AIC.precip <- 2*(n-1)-2*sem.fen.precip$LL
moran.I.res.sem.precip <- moran.test(sem.fen.precip$residuals, k.neigh.w, alternative = "two.sided")

summary.table.simple.sem.precip <- cbind(dplyr::select(sem.coef.precip,Parameter),s.sem.coef.precip[,2:3],sem.coef.precip[,2:3],sem.effectsize.precip[,2:4]) %>%
                rbind(c("AIC",s.sem.AIC.precip,"",sem.AIC.precip,"","","","")) %>%
                rbind(c("Moran'I res", s.moran.I.res.sem.precip$estimate[[1]],"",moran.I.res.sem.precip$estimate[[1]], "","","",""))
summary.table.simple.sem.precip[,-1] <- apply(summary.table.simple.sem.precip[,-1],2,as.numeric)

saveRDS(summary.table.simple.sem.precip, "summary.table.simple.precip.rds")
Table 6: Summary table for SELM model with 3 predictors: NDVI, Bog and logPrecip
summary.table.simple.precip <- readRDS("summary.table.simple.precip.rds")

n <- 3

knitr::kable(summary.table.simple.precip,digits = 3, caption = "SELM with 3 predictors (NDVI, Bog, logPrecip") %>%
  add_header_above(c("Model"=1,"Scaled SELM" = 2,"SELM" = 2, "Effect size" = 3),italic = T) %>%
  kable_styling(bootstrap_options = "bordered") %>%
  column_spec(column = 8, bold = T) %>% 
  group_rows(group_label = "Model Evaluation", start_row = n+1,end_row=n+2)
SELM with 3 predictors (NDVI, Bog, logPrecip
Model
Scaled SELM
SELM
Effect size
Parameter Estimate std.error Estimate std.error Estimate std.error Percent
NDVI 0.398 0.014 3.162 0.108 1.032 0.003 3.213
logPrecip -0.381 0.035 -1.062 0.098 0.989 0.001 -1.051
Bog 0.135 0.009 0.865 0.059 1.009 0.001 0.868
Model Evaluation
AIC 6209.113 5499.907
Moran’I res 0.010 0.010
s.sem.fen.precip <- readRDS("s.sem.fen.precip.rds")
sem.fen.precip <- readRDS("sem.fen.precip.rds")

f_plotspatial(data = fennoscandia,var = s.sem.fen.precip$residuals, plottitle = "a) Residuals for scaled SEM, with NDVI, logPrecip, Bogs")

f_plotspatial(data = fennoscandia,var = sem.fen.precip$residuals, plottitle = "b) Residuals for unscaled SELM, with NDVI, Precip, Bogs")

Figure 12: Map of residuals for SELM model with NDVI, log Precipitation and Bog as predictors, on a) scaled and b) unscaled variables
s.sem.coef.precip <- readRDS("s.sem.coef.precip.rds")

scaled.plot.precip <- ggplot(s.sem.coef.precip)+geom_col(aes(x=Estimate,y=Parameter), width = 0.5, fill = "#d73027")+
  geom_errorbar(aes(y=Parameter,xmin = Estimate-std.error,xmax=Estimate+std.error), width = 0.5)+
  labs(x = "Scaled estimates",y = "Predictor")+
  theme_bw(base_size = 8)+
  theme(legend.position = "bottom")
sem.effectsize.precip <- readRDS("sem.effectsize.precip.rds")

effectsize.plot.precip <- ggplot(sem.effectsize.precip)+
  geom_col(aes(x=Percent,y=Parameter), width = 0.5, fill = "#4575b4")+
  geom_errorbar(aes(y=Parameter,xmin = Percent-(100*std.error),xmax=Percent+(100*std.error)), position = "dodge", width = 0.5)+
  labs(x = "Effect size in %", y = "")+
  theme_bw(base_size = 8)+theme(legend.position = "bottom")
compare.precip <- cowplot::plot_grid(plotlist = list(scaled.plot.3,effectsize.plot.3),ncol=2)
save_plot(plot = compare.precip, filename = "plot.sem.fen.precip.png" )
print(compare.precip)

Figure 13: Comparison of scaled estimates and effect sizes from the limited SELM model with NDVI, log Precipitation and Bog as predictors.

3.4 Spatial error linear model with NDVI, TNdep and logPrecip

k.neigh.w <- readRDS("k.neigh.w.rds")
fennoscandia <- readRDS("fennoscandia.rds")

s.fm <- s.logTOC~s.NDVI+s.logPrecip+s.TNdep
s.sem.fen.tndep <- errorsarlm(s.fm,fennoscandia,k.neigh.w)
saveRDS(s.sem.fen.tndep,"s.sem.fen.tndep.rds")

fm <- logTOC~NDVI+logPrecip+TNdep
sem.fen.tndep <- errorsarlm(fm,fennoscandia,k.neigh.w)
saveRDS(sem.fen.tndep,"sem.fen.tndep.rds")
s.sem.fen.tndep <- readRDS("s.sem.fen.tndep.rds")
sem.fen.tndep <- readRDS("sem.fen.tndep.rds")
k.neigh.w <- readRDS("k.neigh.w.rds")

s.sem.coef.tndep <- s.sem.fen.tndep$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
s.sem.coef.tndep <- s.sem.coef.tndep %>% rownames_to_column(var = "Parameter")
s.sem.coef.tndep$std.error <- s.sem.fen.tndep$rest.se[-1]
saveRDS(s.sem.coef.tndep, "s.sem.coef.tndep.rds")

sem.coef.tndep <- sem.fen.tndep$coefficients[-1] %>% as.data.frame() %>% 
              setNames("Estimate")
sem.coef.tndep <- sem.coef.tndep %>% rownames_to_column(var = "Parameter")
sem.coef.tndep$std.error <- sem.fen.tndep$rest.se[-1]

sem.effectsize.fen.tndep <- compute.effect.size(sem.coef.tndep)
saveRDS(sem.effectsize.fen.tndep, "sem.effectsize.fen.tndep.rds")


n <- length(sem.coef.tndep$Parameter)

s.sem.AIC.tndep <- 2*(n-1)-2*s.sem.fen.tndep$LL
s.moran.I.res.sem.tndep <- moran.test(s.sem.fen.tndep$residuals, k.neigh.w, alternative = "two.sided")

sem.AIC.tndep <- 2*(n-1)-2*sem.fen.tndep$LL
moran.I.res.sem.tndep <- moran.test(sem.fen.tndep$residuals, k.neigh.w, alternative = "two.sided")

summary.table.sem.tndep <- cbind(dplyr::select(sem.coef.tndep,Parameter),s.sem.coef.tndep[,2:3],sem.coef.tndep[,2:3],sem.effectsize.fen.tndep[,2:4]) %>%
                rbind(c("AIC",s.sem.AIC.tndep,"",sem.AIC.tndep,"","","","")) %>%
                rbind(c("Moran'I res", s.moran.I.res.sem.tndep$estimate[[1]],"",moran.I.res.sem.tndep$estimate[[1]], "","","",""))
summary.table.sem.tndep[,-1] <- apply(summary.table.sem.tndep[,-1],2,as.numeric)

saveRDS(summary.table.sem.tndep, "summary.table.sem.tndep.rds")
Table 6: Summary table for SELM model with 3 predictors: NDVI, Bog and logPrecip
summary.table.sem.tndep <- readRDS("summary.table.sem.tndep.rds")
n <- 3

knitr::kable(summary.table.sem.tndep,digits = 3, caption = "SELM with 3 predictors (NDVI, TNdep, Precipitation") %>%
  add_header_above(c("Model"=1,"Scaled SELM" = 2,"SELM" = 2, "Effect size" = 3),italic = T) %>%
  kable_styling(bootstrap_options = "bordered") %>%
  column_spec(column = 8, bold = T) %>% 
  group_rows(group_label = "Model Evaluation", start_row = n+1,end_row=n+2)
SELM with 3 predictors (NDVI, TNdep, Precipitation
Model
Scaled SELM
SELM
Effect size
Parameter Estimate std.error Estimate std.error Estimate std.error Percent
NDVI 0.403 0.014 3.206 0.111 1.033 0.004 3.258
logPrecip -0.384 0.035 -1.069 0.096 0.989 0.001 -1.058
TNdep 0.168 0.029 0.000 0.000 1.009 0.002 0.907
Model Evaluation
AIC 6384.933 5675.727
Moran’I res 0.009 0.009
s.sem.fen.tndep <- readRDS("s.sem.fen.tndep.rds")
sem.fen.tndep <- readRDS("sem.fen.tndep.rds")

f_plotspatial(data = fennoscandia,var = s.sem.fen.tndep$residuals, plottitle = "a) Residuals for scaled SEM, with NDVI, Precipitation, TNdep")

f_plotspatial(data = fennoscandia,var = sem.fen.tndep$residuals, plottitle = "b) Residuals for unscaled SELM, with NDVI, Precipitation, TNdep")

Figure 12: Map of residuals for SELM model with NDVI, log Precipitation and Bog as predictors, on a) scaled and b) unscaled variables
s.sem.coef.tndep <- readRDS("s.sem.coef.tndep.rds")

scaled.plot.tndep <- ggplot(s.sem.coef.tndep)+geom_col(aes(x=Estimate,y=Parameter), width = 0.5, fill = "#d73027")+
  geom_errorbar(aes(y=Parameter,xmin = Estimate-std.error,xmax=Estimate+std.error), width = 0.5)+
  labs(x = "Scaled estimates",y = "Predictor")+
  theme_bw(base_size = 8)+
  theme(legend.position = "bottom")
sem.effectsize.fen.tndep <- readRDS("sem.effectsize.fen.tndep.rds")

effectsize.plot.tndep <- ggplot(sem.effectsize.fen.tndep)+
  geom_col(aes(x=Percent,y=Parameter), width = 0.5, fill = "#4575b4")+
  geom_errorbar(aes(y=Parameter,xmin = Percent-(100*std.error),xmax=Percent+(100*std.error)), position = "dodge", width = 0.5)+
  labs(x = "Effect size in %", y = "")+
  theme_bw(base_size = 8)+theme(legend.position = "bottom")
compare.tndep <- cowplot::plot_grid(plotlist = list(scaled.plot.tndep,effectsize.plot.tndep),ncol=2)
save_plot(plot = compare.tndep, filename = "compare.fen.tndep.png" )
print(compare.tndep)

Figure 14: Comparison of scaled estimates and effect sizes from the SELM model with NDVI, TNdep and log Precipitation as predictors.

4 References

knitr::knit_exit()
CORDEX. 2021. CORDEX — vERC.” https://portal.enes.org/data/enes-model-data/cordex.
Detsch, Florian. 2021. Download and Process GIMMS NDVI3g Data [R package gimms version 1.2.1].” Comprehensive R Archive Network (CRAN). https://cran.r-project.org/package=gimms.
EMEP. 2022. EMEP MSC-W HOME.” https://emep.int/mscw/mscw_moddata.html#Comp.
Henriksen, Arne, Brit Lisa Skjelvåle, Jaakko Mannio, Anders Wilander, Ron Harriman, Chris Curtis, Jens Peder Jensen, Erik Fjeld, and Tatyana Moiseenko. 1998. Northern European Lake Survey 1995: Finland, Norway, Sweden, Denmark, Russian Kola, Russian Karelia, Scotland and Wales.” Ambio 2 (27): 80–91. https://www.jstor.org/stable/4314692.
Hijmans, Robert J, Jacob Van Etten, Joe Cheng, Michael Sumner, Matteo Mattiuzzi, Jonathan A Greenberg, Oscar Perpinan, et al. 2018. Package ’raster’ Type Package Title Geographic Data Analysis and Modeling.” https://github.com/rspatial/raster/issues/ https://cran.r-project.org/package=raster.
Pebesma, E. 2021. Simple Features for R.” https://orcid.org/0000-0002-9415-4582 https://orcid.org/0000-0002-9415-4582%0Ahttps://orcid.org/0000-0003-2392-6140.
The National Center for Atmospheric Research. 2018. Global GIMMS NDVI3g v1 dataset (1981-2015).” A Big Earth Data Platform for Three Poles. http://poles.tpdc.ac.cn/en/data/9775f2b4-7370-4e5e-a537-3482c9a83d88/.
Tucker, Compton J, Jorge E Pinzon, Molly E Brown, Daniel A Slayback, Edwin W Pak, Robert Mahoney, Eric F Vermote, and Nazmi El Saleous. 2005. An extended AVHRR 8-km NDVI dataset compatible with MODIS and SPOT vegetation NDVI data.” International Journal of Remote Sensing 26 (20): 4485–98. https://doi.org/10.1080/01431160500168686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Package ’dplyr’.” https://cran.r-project.org/web/packages/dplyr/dplyr.pdf.
WorldClim. n.d. Data format — WorldClim 1 documentation.” Accessed November 1, 2021. https://worldclim.org/data/v1.4/formats.html.